Recent Results on “Approximations to Optimal 
Alarm Systems for Anomaly Detection” 

Rodney A. Martin 
NASA Ames Research Center 
Mail Stop 269-1 

Moffett Field, CA 94035-1000, USA 
(650) 604-1334 
Rodney.Martin@nasa.gov 


Abstract — An optimal alarm system and its approximations 
may use Kalman filtering for univariate linear dynamic systems 
driven by Gaussian noise to provide a layer of predictive 
capability. Predicted Kalman filter future process values and 
a fixed critical threshold can be used to construct a candidate 
level-crossing event over a predetermined prediction window. An 
optimal alarm system can be designed to elicit the fewest false 
alarms for a fixed detection probability in this particular scenario. 

I. Introduction 

Recent studies [11], [9] have served as a foundation for 
the application of a novel idea for anomaly detection that is 
derived from the combination of decades-old theory [15], [2] with 
more recent techniques [16], [17]. It was shown by Svensson 
[16], [17] that an optimal alarm system can be constructed 
by finding relevant alarm system metrics as a function of a 
design parameter by way of an optimal alarm condition. The 
optimal alarm condition is fundamentally an alarm region or 
decision boundary based upon a likelihood ratio criterion via 
the Neyman-Pearson lemma, as shown in [3], [8]. This allows 
us to design an optimal alarm system that will elicit the fewest 
possible false alarms for a fixed detection probability. 

Due to the fact that the alarm regions cannot be expressed 
in closed form, one of the aims of previous studies has been 
to investigate approximations for the design of an optimal 
alarm system. Such an alarm system uses Kalman filtering 
along with temporally varying auxiliary thresholds to provide 
a layer of predictive capability. The resulting metrics can 
easily be compared to methods that incorporate auxiliary fixed 
thresholds or redlines that may also provide a similar layer of 
predictive capability, but have no provision for minimizing 
false alarms. 

The design of optimal alarm systems demonstrates potential 
to enhance reliability and support health management for space 
propulsion, civil aerospace applications, and, more fundamentally, 
aeronautics research. A false alarm can be costly, and even 
dangerous, when it prompts evasive or extreme action. Reducing 
false alarms therefore offers opportunities for cost savings and cost 
avoidance, enhancement of overall safety, and reduction of 
technical risks for NASA programs and projects. Furthermore, 
within NASA’s space program, a missed detection, that is, a failure 
to abort in the presence of valid indicators, can result in the 
catastrophic loss of mission, crew, and/or vehicle. 

Even though recent studies have been limited to application- 
specific datasets, our intent is to demonstrate the utility of 
the technique from a much broader perspective. In [11] level- 
crossing events of the type most amenable to monitoring of 
control system error were used to derive the design framework 
for an optimal alarm system via the ROC curve. From the ap- 
plications perspective, we assume that the control system has 
already been designed, is robust to environmental disturbances, 
and rejects them expediently. Therefore, when unexpected 
large transients in the control system error occur, this may be 
indicative of an impending fault or change in system that may 
be cause for further diagnostic investigation. This error can be 
compared against a threshold whose selection is based upon 
the physics of the system and the margin of safety required. 
The threshold may also be determined from domain experts, 
experimentally, in flight tests, or by using statistical models. 

Alternatively, a serial architecture can be used to preprocess 
a full feature space, implicitly reducing the entire feature space 
into a univariate signal while retaining salient operational 
signatures [10]. This is performed by using the composite 
score generated by any algorithm with favorable properties as 
training data for a linear dynamic system. This is potentially a 
far more effective approach than using only a small fraction of 
the feature space by using the control system error alone. As 
such we may potentially allow for many more anomalies to 
be detected by using this paradigm. Furthermore, allowing for 
this sort of preprocessing lifts the restriction of this algorithm 
to the control systems domain, and addresses our objective of 
demonstrating the utility of the technique from a much broader 
perspective. 

II. Background 

Coincidentally, the techniques investigated as part of this research 
have their origins in applications to legacy NASA 
platforms. Rudolf E. Kalman found a unique application of his 
now very well-known Kalman filter for the Apollo program 
and more broadly to aerospace applications in general, due in 
part to finding support at NASA Ames Research Center in the 
mid 1960’s [15]. 



Although tremendously popular and ubiquitous in today’s 
aerospace systems, practical applications of Kalman filtering 
for aerospace have largely been relegated to state estimation 
for guidance, navigation, and control purposes. Auxiliary 
failure detection and bad data rejection algorithms 
have been developed in concert with Kalman filters [15], [18], 
[5]; however, the main purpose of those Kalman filters was 
state estimation in guidance, navigation, and control systems. 

Kalman filtering has seen limited practical application ded- 
icated to system reliability and health management as related 
to exceedance of predetermined failure thresholds in aerospace 
systems. The difference in the approach that we take with 
this investigation is that the Kalman filter machinery will 
be implemented for the express purpose of system reliability 
and health management, invoking more recently available data 
mining and machine learning techniques [4], [12], [13], [10] 
to develop suitable models. 

Almost in parallel with Kalman’s breakthrough, a perhaps 
lesser-known study [6] was conducted by Ross Leadbetter and 
Harald Cramér, pioneers in the field of the statistics of 
level crossings and extremes. This study was also funded by 
NASA, and yielded interesting results on the more theoretical 
aspects of level-crossing behavior of random processes. The 
work was motivated by Gertrude Cox’s 
charter to Ross Leadbetter and Harald Cramér at the time to 
“make comprehensive statistical models for manned spaceflight 
systems.” They ended up supporting a small corner 
of that effort, having to do with the reliability of guidance 
systems. They approached the problem by modeling the error in 
a guidance system and declaring failure if it went out of 
prescribed limits in a mission period, which led to their work 
on crossings and extremes [7]. 

All three researchers are legendary, celebrated mathemati- 
cians/statisticians in their own right; however, the work was 
never truly developed to its fullest potential for its intended 
purpose. Over the years Leadbetter’s younger Swedish col- 
leagues developed theories which ultimately yielded the idea 
of optimal alarm systems [17], which is used in this study. 
There are still parts of Leadbetter’s original theoretical constructs 
that have gone unused for their originally intended 
target application. 

As such, a future research objective is to marry the largely 
uncultivated portions of Leadbetter’s theory for its intended 
purpose and the results generated by his younger Swedish 
colleagues, enabled by none other than the Kalman filter, 
coming full circle. Therefore, with further development and 
implementation across a broad spectrum of NASA aerospace 
platforms, this activity also has the potential to generate new 
knowledge that has evolved from the results of NASA-based 
legacy programs. 

III. Methodology 

Our underlying assumption is that we can fit measured or 
transformed data to a model represented by a linear dynamic 
system driven by Gaussian noise. The state-space formulation 
is shown in Eqns. 1-3, demonstrating propagation of both the 
state and the covariance matrix with time-invariant parameters. 


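$$x_{k+1} = A x_k + w_k \tag{1}$$

$$y_k = C x_k + v_k \tag{2}$$

$$P_{k+1} = A P_k A^T + Q \tag{3}$$

where 

$$w_k \sim \mathcal{N}(0, Q)$$

$$v_k \sim \mathcal{N}(0, R)$$

$$x_0 \sim \mathcal{N}(\mu_x, P_0)$$

$$\mu_x = E[x_k]$$

$$P_k = E[(x_k - \mu_x)(x_k - \mu_x)^T]$$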
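To make the model concrete, the following Python sketch draws one output trajectory from the state-space model of Eqns. 1-2 and propagates the state covariance as in Eqn. 3. The function names and every numerical value are illustrative assumptions (they are not the parameters used in the Results), a scalar output is assumed, and NumPy is assumed to be available.

```python
import numpy as np


def simulate_lds(A, C, Q, R, mu_x, P0, n_steps, seed=0):
    """Draw y_0, ..., y_{n_steps-1} from x_{k+1} = A x_k + w_k, y_k = C x_k + v_k."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = rng.multivariate_normal(mu_x, P0)            # x_0 ~ N(mu_x, P_0)
    y = np.empty(n_steps)
    for k in range(n_steps):
        v = rng.normal(0.0, np.sqrt(R))              # v_k ~ N(0, R), scalar output
        y[k] = (C @ x).item() + v                    # Eqn. 2
        w = rng.multivariate_normal(np.zeros(n), Q)  # w_k ~ N(0, Q)
        x = A @ x + w                                # Eqn. 1
    return y


def propagate_covariance(A, Q, P0, n_steps):
    """Apply P_{k+1} = A P_k A^T + Q (Eqn. 3) n_steps times, starting from P_0."""
    P = P0.copy()
    for _ in range(n_steps):
        P = A @ P @ A.T + Q
    return P


# Hypothetical parameters, chosen only so that A is stable.
A_demo = np.array([[0.9, 0.1], [0.0, 0.8]])
C_demo = np.array([[1.0, 0.0]])
Q_demo = 0.1 * np.eye(2)
R_demo = 0.05
y_demo = simulate_lds(A_demo, C_demo, Q_demo, R_demo,
                      mu_x=np.zeros(2), P0=np.eye(2), n_steps=200)
P_demo = propagate_covariance(A_demo, Q_demo, np.eye(2), n_steps=200)
```

Because the demo matrix is stable, repeated application of Eqn. 3 settles to a fixed covariance, which is the behavior the steady-state discussion later in the paper relies on.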

The parameters to be learned are collected below as the 
parameter $\theta$. These parameters are also shown in Fig. 1, which 
specifies them in relation to the probabilistic graphical modeling 
paradigm that may be used for machine learning purposes. 

$$\theta = (\mu_x, P_0, A, C, Q, R) \tag{4}$$

The essence of the optimal alarm system is derived from 
the use of the likelihood ratio resulting in the conditional 
inequality $P(C_k \mid y_0, \ldots, y_k) > P_b$. This basically says “give 
alarm when the conditional probability of the event, $C_k$, 
exceeds the level $P_b$.” Here, $P_b$ represents some optimally chosen 
border or threshold probability with respect to a relevant alarm 
system metric. It is necessary to find the alarm regions in 
order to design the alarm system. The event, $C_k$, can be 
chosen arbitrarily, and is usually defined with respect to a 
pre-specified critical threshold, $L$, as well as a prediction window, 
$d$. In this paper, the event of interest is shown in Eqn. 5, and 
represents at least one exceedance outside of the threshold 
envelope specified by $[-L, L]$ of the process $y_k$ within the 
specified look-ahead prediction window, $d$. 

$$C_k = \{|y_k| > L\} \cup \bigcup_{j=1}^{d} \left[ \bigcap_{i=0}^{j-1} |y_{k+i}| < L,\; |y_{k+j}| > L \right] \tag{5}$$
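As a small illustration of the event described above, the sketch below labels a recorded output sequence with the level-crossing event in two ways: directly, as "at least one exceedance", and as a union over the time of the first exceedance. It assumes the window covers times k through k+d; that reading, the function names, and the sample values are assumptions made here for illustration. The two forms agree whenever no sample lies exactly on the threshold.

```python
def event_any_exceedance(y, k, L, d):
    """True if |y| exceeds L at least once over times k, ..., k+d."""
    return any(abs(y[k + j]) > L for j in range(d + 1))


def event_first_crossing(y, k, L, d):
    """Same event, written as a union over the time of the first exceedance."""
    if abs(y[k]) > L:                       # already outside the envelope at time k
        return True
    for j in range(1, d + 1):
        inside_before = all(abs(y[k + i]) < L for i in range(j))
        if inside_before and abs(y[k + j]) > L:
            return True
    return False


# Made-up samples; the caller must ensure k + d stays inside the sequence.
y_demo = [0.0, 5.0, 17.0, 3.0, -20.0, 1.0, 2.0, 3.0, 1.0, 0.0]
d_demo = 3
labels = [event_any_exceedance(y_demo, k, L=16.0, d=d_demo)
          for k in range(len(y_demo) - d_demo)]
```

The first-crossing form splits the event into mutually exclusive pieces, which is convenient when probabilities of the pieces are to be summed.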

Fig. 1. Linear Dynamic System 

There are three different alarm systems to compare, all of which 
attempt to predict the level-crossing event defined by 
Eqn. 5, whose probability, $P(C_k)$, can be computed according 
to formulae presented in [11]. The first alarm system attempts 
to define an envelope, $[-L_A, L_A]$, outside of which an alarm 
will activate. In order to provide for a layer of predictive 
capability, $L_A$ should be chosen such that $L_A < L$. An alarm 
probability can likewise be computed, $P(A_k) = P(|y_k| > L_A)$, 
and the details of this formula are also provided in 
[11]. This “redline” alarm system is termed as such in order 
to indicate that a simple level is used, and often the same 
terminology is used in practice. Even without the benefit of 
using any predicted future process values, this alarm system 
would be superior to a true redline system that uses only a 
single level $L$. However, in this case two levels are used, $L$ 
as the failure threshold, and $L_A$ as the design threshold. 
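As a rough illustration of the redline rule and its alarm probability, suppose the output is stationary, zero-mean, and Gaussian with standard deviation `sigma_y`. These are simplifying assumptions made for this sketch only (the formula actually used is the one in [11]), and the function names are hypothetical. Under them, the probability of leaving the design envelope is 2(1 - Phi(L_A / sigma_y)), where Phi is the standard normal distribution function.

```python
from statistics import NormalDist


def redline_alarm(y_k, L_A):
    """Redline rule: alarm when the current output leaves [-L_A, L_A]."""
    return abs(y_k) > L_A


def redline_alarm_probability(L_A, sigma_y):
    """P(|y_k| > L_A) for a zero-mean Gaussian output with std sigma_y."""
    return 2.0 * (1.0 - NormalDist().cdf(L_A / sigma_y))
```

Lowering the design threshold raises the alarm probability, which is the tradeoff the ROC analysis later quantifies.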

The second alarm system incorporates the use of predicted 
future process values, and is called the “predictive” alarm system. 
This alarm system also defines an envelope, $[-L_A, L_A]$, 
outside of which an alarm will sound. Similarly, $L_A$ should 
be chosen such that $L_A < L$ in order to provide for a layer 
of predictive capability. However, the alarm probability is 
defined in a different fashion than for the redline method, 
as $P(A_k) = P(|\hat{y}_{k+d|k}| > L_A)$, where the predicted future 
process value $\hat{y}_{k+d|k}$ is found from standard Kalman filter 
equations shown in Eqns. 6-11 by using the definitions below. 

$$\hat{x}_{k|k} = E[x_k \mid y_0, \ldots, y_k]$$

$$P_{k|k} = E[(x_k - \hat{x}_{k|k})(x_k - \hat{x}_{k|k})^T \mid y_0, \ldots, y_k]$$

$$\hat{y}_{k|k} = C \hat{x}_{k|k} \tag{6}$$

$$\hat{x}_{k+1|k} = A \hat{x}_{k|k} \tag{7}$$

$$F_{k+1|k} = P_{k+1|k} C^T (C P_{k+1|k} C^T + R)^{-1} \tag{8}$$

$$P_{k+1|k} = A P_{k|k} A^T + Q \tag{9}$$

$$P_{k+1|k+1} = P_{k+1|k} - F_{k+1|k} C P_{k+1|k} \tag{10}$$


Eqn. 8 represents the dynamically updated Kalman gain, 
and combining the two equations 9 and 10, we may obtain 
the Riccati equation (Eqn. 11). 

$$P_{k+1|k} = A P_{k|k-1} A^T - A F_{k|k-1} C P_{k|k-1} A^T + Q \tag{11}$$
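The recursion above can be sketched in Python as follows. This is a minimal illustration for a scalar output, not the author's implementation: the function names are hypothetical, NumPy is assumed, and the state-estimate measurement update inside `kalman_step` is the standard one, which is assumed here because it is not among the listed equations. The d-step prediction uses the standard result that the predicted output is C A^d applied to the filtered state.

```python
import numpy as np


def kalman_step(x_filt, P_filt, y_next, A, C, Q, R):
    """Advance (x_{k|k}, P_{k|k}) to (x_{k+1|k+1}, P_{k+1|k+1}) given y_{k+1}."""
    x_pred = A @ x_filt                                      # Eqn. 7
    P_pred = A @ P_filt @ A.T + Q                            # Eqn. 9
    F = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)   # Eqn. 8
    # Standard measurement update of the state estimate (assumed; not listed above).
    x_new = x_pred + F @ (np.atleast_1d(y_next) - C @ x_pred)
    P_new = P_pred - F @ C @ P_pred                          # Eqn. 10
    return x_new, P_new, P_pred


def riccati_step(P_pred, A, C, Q, R):
    """Eqn. 11: map P_{k|k-1} directly to P_{k+1|k}."""
    F = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    return A @ P_pred @ A.T - A @ F @ C @ P_pred @ A.T + Q


def predicted_output(x_filt, A, C, d):
    """d-step-ahead output prediction, C A^d x_{k|k}."""
    return (C @ np.linalg.matrix_power(A, d) @ x_filt).item()


def predictive_alarm(x_filt, A, C, d, L_A):
    """Predictive rule: alarm when the d-step prediction leaves [-L_A, L_A]."""
    return abs(predicted_output(x_filt, A, C, d)) > L_A
```

Running `kalman_step` twice and comparing the second predicted covariance with `riccati_step` applied to the first is a quick check that Eqns. 9 and 10 combine into Eqn. 11.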

The final alarm system to be compared to the previous two 
is the optimal alarm system. It has two approximations, but 
only the one presented as Eqn. 12 will be used for comparison 
in this paper. The alarm condition, $P(C_k \mid y_0, \ldots, y_k) > P_b$, 
can be approximated to form the alarm region specified in 
Eqn. 12. 

$$A_k = \bigcup_{i=0}^{d} \left\{ |\hat{y}_{k+i|k}| > L + \sqrt{V_{k+i|k}}\, \Phi^{-1}(P_b) \right\} \tag{12}$$

where $\Phi^{-1}(\cdot)$ represents the inverse cumulative standard 
normal distribution function, and $V_{k+i|k} = \mathrm{Var}(y_{k+i} \mid y_0, \ldots, y_k)$. 
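The approximate alarm rule can be sketched in a few lines of Python. The function name and argument layout are assumptions made for illustration: `y_pred[i]` and `v_pred[i]` stand for the predicted output and its prediction variance i steps ahead, and `P_b` must lie strictly between 0 and 1. Only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def optimal_alarm_approx(y_pred, v_pred, L, P_b):
    """Alarm if any predicted |y| exceeds its time-varying auxiliary threshold.

    The threshold for step i is L + sqrt(v_pred[i]) * Phi^{-1}(P_b).
    """
    z = NormalDist().inv_cdf(P_b)   # Phi^{-1}(P_b); negative when P_b < 0.5
    return any(abs(y) > L + sqrt(v) * z for y, v in zip(y_pred, v_pred))
```

One way to read the rule: for a single step and a single tail, requiring the Gaussian probability of exceeding L to be at least P_b rearranges to the predicted value exceeding L plus the predictive standard deviation times Phi^{-1}(P_b). With P_b below 0.5 the auxiliary threshold sits inside the envelope, and it moves further inward as the prediction variance grows.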

Eqn. 12 plays a pivotal role in enabling the enforcement of 
the approximation to the alarm region for an optimal alarm 
system. Using this approximation allows it to outperform 
the other alarm systems with respect to the minimization of 
false alarms. All of the three alarm systems described will 
be compared using the area under the ROC curve (AUC). 
This provides a performance metric with which to assess and 
compare the performance of each alarm system. The ROC 
curve parametrically displays the true positive rate against 
the false positive rate. The AUC has been deemed as a 
theoretically valid metric for model selection and algorithmic 
comparison [14]. 

The parameters of interest are $L_A$ for the redline and 
predictive methods, and $P_b$ for the approximation to the optimal 
alarm system. It is possible to generate formulae for the true 
and false positive rates as a function of these parameters ($L_A$, 
$P_b$) as well as the model parameters ($\theta$) by appealing to Eqns. 
13-14. The details for constructing these formulae are provided 
in [11]. 


True positive rate: 

$$P(C_k \mid A_k) = \frac{P(C_k, A_k)}{P(A_k)} \tag{13}$$

False positive rate: 

$$P(A_k \mid C'_k) = \frac{P(C'_k, A_k)}{P(C'_k)} \tag{14}$$
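The paper evaluates these rates analytically using the formulae in [11]. As an empirical counterpart, the sketch below estimates the same two conditional probabilities by counting over paired sequences of event and alarm indicators, and integrates a set of ROC points with the trapezoidal rule. This counting approach, the function names, and the sample values are assumptions introduced for illustration.

```python
def empirical_rates(events, alarms):
    """Estimate P(C | A) and P(A | C') by counting over paired indicators."""
    n_alarm = sum(bool(a) for a in alarms)
    n_no_event = sum(not c for c in events)
    n_event_and_alarm = sum(bool(c and a) for c, a in zip(events, alarms))
    n_false_alarm = sum(bool((not c) and a) for c, a in zip(events, alarms))
    tpr = n_event_and_alarm / n_alarm if n_alarm else float("nan")
    fpr = n_false_alarm / n_no_event if n_no_event else float("nan")
    return tpr, fpr


def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve given matching lists of (fpr, tpr) points."""
    points = sorted(zip(fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up indicator sequences, for illustration only.
events_demo = [True, True, False, False, True, False]
alarms_demo = [True, False, True, False, True, False]
tpr_demo, fpr_demo = empirical_rates(events_demo, alarms_demo)
```

Sweeping the design parameter of an alarm system, collecting one pair of rates per setting, and passing the pairs to `auc_trapezoid` gives a single number for comparing alarm systems.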


IV. Results 


The example to be used for the presentation of our results 
has no specific application, but is generic, and the model 
parameters are provided in Eqns. 15-18. 


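$$A = \begin{bmatrix} \text{?} & 1 \\ \text{?} & 1.8 \end{bmatrix} \tag{15}$$

(The first-column entries of $A$ are illegible in the source.) 

$$C = \begin{bmatrix} 0.5 & 1 \end{bmatrix} \tag{16}$$

$$Q = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \tag{17}$$

$$R = 0.08 \tag{18}$$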


Unless otherwise stated, for all three cases to compare 
(redline, predictive, and optimal), the threshold is $L = 16$, and 
the prediction window is $d = 5$. Fig. 2 represents the optimal 
alarm region decision boundary for a sample system and two 
level-crossing events that span a prediction window of three 
time steps. The figure shown on the right is of the same form 
that we are investigating in Eqn. 12. Approximations to this 
sort of alarm region are required for the most computationally 
efficient generation of a ROC curve or other similar alarm 
system design metrics. 

Some recent results of computing the AUC as a function 
of the prediction window, d, are shown on the left of Fig. 3. 
We show the AUC for the three methods described thus far 
to be compared. Clearly, the approximations to the optimal 
alarm system outperform the redline and predictive methods, 
for the entire prediction horizon. This figure can also be used 
as a preliminary design step for choice of maximal prediction 
window corresponding to a minimum allowable AUC as the 
criterion for selection. 

For example, if $\mathrm{AUC}_{\min} = 0.95$ is set as the minimum 
allowable AUC, the maximal prediction window is obtained 
by using the optimal alarm system, and corresponds to $d = 5$. 
The final design step will involve choosing the ROC curve 
corresponding to this maximal prediction window. Using this 
ROC curve, a value of $P_b$ can be selected based upon the 
desired tradeoff between true and false positive rates. 

For contrast, shown on the right of Fig. 3 is a plot of 
the prediction variance, $V_{k+d|k}$, and the bounded uncertainty. 
$V_{k+d|k}$ is a function of model parameters as shown in Eqn. 
19. Taking $\lim_{d \to \infty} V_{k+d|k} = C P_k C^T + R$ provides the finite 
bound on uncertainty for an infinite prediction horizon, and as 
such represents the maximum uncertainty for predicted future 
process values. The finiteness of the bound is guaranteed only 
if $\rho(A) < 1$, where $\rho(\cdot)$ is the spectral radius operator. 

$$V_{k+d|k} = C \left[ A^d (P_{k|k} - P_k)(A^d)^T + P_k \right] C^T + R \tag{19}$$
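A short derivation sketch shows where Eqn. 19 and its limit come from. It assumes, for the purposes of this sketch, that the unconditional covariance has reached its stationary value, so that $P_k = P$ with $P = A P A^T + Q$ from Eqn. 3, and it writes $P_{k+d|k}$ for the state covariance predicted $d$ steps ahead without further measurements.

```latex
\begin{align*}
P_{k+d|k} &= A^d P_{k|k} (A^d)^T + \sum_{j=0}^{d-1} A^j Q (A^j)^T
  && \text{(Eqn. 9 applied $d$ times)} \\
\sum_{j=0}^{d-1} A^j Q (A^j)^T
  &= \sum_{j=0}^{d-1} A^j \left( P - A P A^T \right) (A^j)^T
   = P - A^d P (A^d)^T
  && \text{(telescoping sum)} \\
V_{k+d|k} &= C P_{k+d|k} C^T + R
   = C \left[ A^d (P_{k|k} - P)(A^d)^T + P \right] C^T + R \\
\lim_{d \to \infty} V_{k+d|k} &= C P C^T + R
  && \text{(since $A^d \to 0$ when $\rho(A) < 1$)}
\end{align*}
```

The last line makes the role of the spectral radius condition explicit: the term carrying the measurement information decays with $A^d$, leaving only the unconditional output variance.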

Due to the assumption of time-invariance for our model 
parameters, we require the necessary and sufficient conditions of 
controllability or stabilizability of $(A, \sqrt{Q})$ and observability 
or detectability of $(A, C)$ in order to obtain a well-defined 
steady-state Kalman filter. The observability condition can 
easily be proven by taking $\lim_{k \to \infty} E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T]$, 
where $\hat{x}_k$ is the estimate of a generic observer. Our 
time-invariance assumption also allows for the optimal alarm system 
to be designed off-line, rather than computing $P_{k|k}$ and $P_k$ and 
re-designing the alarm system at each time step. 

As such, we can use the solution to the discrete algebraic 
Riccati equation, $\hat{P}^{R}_{ss}$, in place of $P_{k|k}$ for Eqn. 19. $\hat{P}^{R}_{ss}$ is the 
a posteriori steady-state covariance, and is a quadratic function 
of the a priori steady-state covariance matrix, $P^{R}_{ss}$. $P^{R}_{ss}$ is the 
algebraic counterpart of Eqn. 11. Similarly, $P_{ss}$, the solution 
to the discrete algebraic Lyapunov equation, can be used in 
place of its counterpart $P_k$ from Eqn. 3. 

$V_{k+d|k}$ therefore requires the solutions to both the steady-state 
Riccati and Lyapunov equations, and its bound is 
dependent only on the Lyapunov equation, as indicated in the 
legend on the right of Fig. 3. The Riccati solution is inherently 
a conditional covariance matrix by definition of $P_{k|k}$, and the 
Lyapunov solution is inherently an unconditional covariance 
matrix by definition of $P_k$. 
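The off-line computation described above can be sketched as follows. Rather than calling a dedicated algebraic-equation solver, this illustration simply iterates the covariance recursions of Eqns. 3 and 11 until they stop changing, which is adequate for a stable system. The function names, tolerances, and the demo system are assumptions made for this sketch and are not the example used in the Results; NumPy is assumed.

```python
import numpy as np


def lyapunov_steady_state(A, Q, tol=1e-12, max_iter=100_000):
    """Iterate P <- A P A^T + Q (Eqn. 3) to a fixed point; needs rho(A) < 1."""
    P = Q.copy()
    for _ in range(max_iter):
        P_next = A @ P @ A.T + Q
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("Lyapunov iteration did not converge")


def riccati_steady_state(A, C, Q, R, tol=1e-12, max_iter=100_000):
    """Iterate Eqn. 11 to a fixed point; returns (a priori, a posteriori)."""
    P = Q.copy()
    for _ in range(max_iter):
        F = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
        P_next = A @ (P - F @ C @ P) @ A.T + Q
        if np.max(np.abs(P_next - P)) < tol:
            F = P_next @ C.T @ np.linalg.inv(C @ P_next @ C.T + R)
            return P_next, P_next - F @ C @ P_next
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")


def prediction_variance(A, C, R, P_post, P_lyap, d):
    """Eqn. 19 with the steady-state covariances substituted."""
    Ad = np.linalg.matrix_power(A, d)
    inner = Ad @ (P_post - P_lyap) @ Ad.T + P_lyap
    return (C @ inner @ C.T).item() + R


# Hypothetical stable system, for illustration only.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Q = 0.1 * np.eye(2)
R = 0.05
P_lyap = lyapunov_steady_state(A, Q)
P_prior, P_post = riccati_steady_state(A, C, Q, R)
bound = (C @ P_lyap @ C.T).item() + R
variances = [prediction_variance(A, C, R, P_post, P_lyap, d) for d in range(31)]
```

In this sketch `bound` depends only on the Lyapunov solution, while each entry of `variances` needs both solutions, mirroring the distinction drawn in the text.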


The graph on the right of Fig. 3 allows us to obtain 
an estimate of the margin to maximum uncertainty for $\hat{y}_{k+d|k}$ 
when using the chosen maximal prediction window, $d = 5$. 
This estimate serves only as a relative indicator of uncertainty 
for $\hat{y}_{k+d|k}$. It also serves to contrast the optimal alarm system with 
the predictive alarm system, the latter of which does not use 
uncertainty as part of its construction. This is apparent in the 
qualitative oscillations that evolve with increased prediction 
window for both the predictive alarm system on the left of 
Fig. 3, and the prediction variance on the right of Fig. 3. 

V. Future Work 

Because algorithms based upon the optimal alarm system 
concept appeal to data mining and machine learning tech- 
niques, they are clearly viable candidates for extension to 
techniques such as particle filtering. Performing this extension 
will enable event distributions and model parameters to be 
adaptively updated as in [1] rather than making convenient 
Gaussian and stationary assumptions. However, with particle 
filtering, the formulation of the problem can involve non-Gaussian 
noise, as well as non-linearities which were not 
covered in [1]. 

Furthermore, we want to investigate improved approxima- 
tions that would provide a tighter bound on the alarm regions 
shown in Fig. 2. We will also investigate and compare the 
discrepancy between the error accumulated due to techniques 
studied here, and those due to improved approximations. 
Future development will involve more rigorous testing and 
validation of the alarm systems discussed by using standard 
machine learning techniques and consideration of more com- 
plex, yet practically meaningful critical level-crossing events. 

Finally, a more detailed investigation of model fidelity with 
respect to available data and metrics has been conducted 
[10]. As such, future work on modeling will involve the 
investigation of necessary improvements in initialization tech- 
niques and data transformations for a more feasible fit to 
the assumed model structure. Additionally, we will explore 
the integration of physics-based and data-driven methods in a 
Bayesian context, by using a more informative prior. 